This post discusses the mathematics of eigenvectors and eigenvalues, the spectral decomposition theorem, and principal component analysis (PCA). Moreover, I will show an application of this technique.

EIGENVECTORS AND EIGENVALUES

Definition 1. A non-null vector v \in V (i.e. \vec{v} \neq 0), where V is a vector space, is said to be an eigenvector of T if there exists a real number \lambda such that T(v) = \lambda v. The scalar \lambda is called an eigenvalue of T associated with v. It follows that v and T(v) have the same support line (i.e. the linear transformation T maps the vector v into the same direction):

\begin{aligned}
  T: V \rightarrow V \\[5px]
  \Rightarrow
  T(\vec{v}) = \lambda \vec{v} \\[5px]
  \Rightarrow A\vec{v} = \lambda\vec{v}
  \end{aligned}

Proposition 1. The eigenvalues of A are precisely the roots of the characteristic polynomial of A.

Let A \in \mathcal{M}_{p}(\mathbb{R}) (equivalently, A \in \mathbb{R}^{p \times p}) be the matrix of the transformation T: \mathbb{R}^{p} \rightarrow \mathbb{R}^{p}. Therefore

\lambda \in \mathbb{R} is an eigenvalue of A

\Leftrightarrow \exists \vec{v} \neq 0 such that A\vec{v} = \lambda\vec{v}

\Leftrightarrow A\vec{v} - \lambda I\vec{v} = 0, since I\vec{v} = \vec{v}

\Leftrightarrow (A - \lambda I)\vec{v} = 0

in which I is the identity matrix. This system has a non-trivial solution if, and only if, the determinant of the matrix (A - \lambda I) is zero (a consequence of Cramer's rule),

\begin{cases}
      \vec{v} \neq 0\\
      \det(A - \lambda I) = 0
    \end{cases}

In this manner, (A - \lambda I) is singular (non-invertible). If (A - \lambda I) were invertible, the only solution would be the trivial one, \vec{v} = (A - \lambda I)^{-1}\vec{0} = \vec{0}, which is not what we want!

Example: Let A be the matrix below; we will find its eigenvectors and eigenvalues.

\begin{bmatrix}
 2 & -1 \\
-1 &  2 
\end{bmatrix}

and knowing that \det(A - \lambda I) = 0. Then,

\det\begin{bmatrix}
 2 - \lambda & -1 \\
-1 &  2 - \lambda 
\end{bmatrix} = (2 - \lambda)(2 - \lambda) - 1 = 0 \Rightarrow \lambda^2 - 4 \lambda + 3 = 0 \Rightarrow (\lambda - 3)(\lambda - 1) = 0

The solutions are the eigenvalues \lambda_{1} = 3 and \lambda_{2} = 1. For the eigenvector associated with \lambda_{1} = 3, we have

\begin{bmatrix}
 2 - 3 & -1 \\
-1 &  2 - 3 
\end{bmatrix} \begin{bmatrix}
 v_{1}  \\
 v_{2}
\end{bmatrix} = \vec{0}

\begin{bmatrix}
 -1 & -1 \\
-1 &  -1 
\end{bmatrix} \begin{bmatrix}
 v_{1}  \\
 v_{2}
\end{bmatrix} = \vec{0}

\begin{cases}
      - v_{1} - v_{2} = 0\\
      - v_{1} - v_{2} = 0
    \end{cases} \Rightarrow  v_{1} = - v_{2}

so \vec{v} = (- v_{2}, v_{2})^{\top} = v_{2}(- 1, 1)^{\top}, and the eigenspace of \lambda = 3 is \text{span}\bigg\{\begin{bmatrix}
 -1  \\
 1
\end{bmatrix}\bigg\} \Rightarrow \vec{v}_{1} = (-1, 1)^{\top}

For the eigenvector associated with \lambda_{2} = 1, we have

\begin{bmatrix}
 2 - 1 & -1 \\
-1 &  2 - 1 
\end{bmatrix} \begin{bmatrix}
 v_{1}  \\
 v_{2}
\end{bmatrix} = \vec{0}

\begin{bmatrix}
 1 & -1 \\
-1 &  1 
\end{bmatrix} \begin{bmatrix}
 v_{1}  \\
 v_{2}
\end{bmatrix} = \vec{0}

\begin{cases}
        v_{1} - v_{2} = 0\\
      - v_{1} + v_{2} = 0
    \end{cases} \Rightarrow  v_{1} = v_{2}

so \vec{v} = (v_{2}, v_{2})^{\top} = v_{2}(1, 1)^{\top}, and the eigenspace of \lambda = 1 is \text{span}\bigg\{\begin{bmatrix}
 1  \\
 1
\end{bmatrix}\bigg\} \Rightarrow \vec{v}_{2} = (1, 1)^{\top}

and normalizing the vectors to length 1 (i.e., making them unit vectors), they are

\frac{\vec{v}_{1}}{||\vec{v}_{1}||} = \begin{bmatrix}
 -1/\sqrt{2} \\
 1/\sqrt{2}
\end{bmatrix}

\frac{\vec{v}_{2}}{||\vec{v}_{2}||} = \begin{bmatrix}
 1/\sqrt{2} \\
 1/\sqrt{2}
\end{bmatrix}
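The worked example above can be checked numerically. Below is a minimal sketch using NumPy (not part of the original derivation); `np.linalg.eigh` is used because A is symmetric, and it returns eigenvalues in ascending order with unit-norm eigenvectors in the columns.

```python
import numpy as np

# Matrix from the example above
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Eigenvalues come back in ascending order; eigenvectors are the columns
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Matches the hand computation: lambda = 1 and lambda = 3
assert np.allclose(eigenvalues, [1.0, 3.0])

# Each column satisfies A v = lambda v (eigenvector signs may differ
# from the hand computation, since -v is also a valid eigenvector)
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```

Note that numerical routines normalize eigenvectors automatically, which is exactly the unit-vector step done by hand above.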

SPECTRAL DECOMPOSITION THEOREM

Reminder: I will discuss only the spectral decomposition theorem, but other methods, such as the singular value decomposition (SVD), could be used.

A symmetric matrix is a matrix that is equal to its transpose. Thus,

A \in \mathcal{M}_{p}(\mathbb{R}) \Rightarrow A^{\top} \in \mathcal{M}_{p}(\mathbb{R}) \Rightarrow A = A^{\top}

Spectral Theorem (symmetric matrix): Let A \in \mathbb{R}^{p \times p} be symmetric; then

  1. The eigenvalues \lambda_{1}, \dots, \lambda_{p} are real;
  2. The eigenvectors of A associated with different eigenvalues are orthogonal to each other. Then:

    \begin{aligned}
\langle\vec{v}_{i}, \vec{v}_{j}\rangle &= \vec{v}_{i}^{\top} \vec{v}_{j} \\[10pt]
 &=  \sum_{k=1}^{p} v_{ki}v_{kj}  \\[10pt]
 &= v_{1i}v_{1j} + \dots+ v_{pi}v_{pj} \\[10pt]
 &= 0
\end{aligned}

    with \vec{v}_{j} = (v_{1j}, \dots, v_{pj})^{\top} \in \mathbb{R}^{p \times 1}, i \neq j and i, j, k = 1, \dots, p;

    Proposition 2. An orthogonal set of non-zero vectors is linearly independent.

    Reminder:

    Linearly independent \nRightarrow Orthogonal

    Linearly independent \Leftarrow Orthogonal

  3. It is always possible to obtain a basis \{e_{1}, \dots, e_{p}\} of orthonormal eigenvectors. Then:

    \begin{matrix}
 \cfrac{\vec{v}_{1}}{||\vec{v}_{1}||} = (e_{11}, e_{21}, \dots, e_{p1})^{\top} = e_{1} \\
  \cfrac{\vec{v}_{2}}{||\vec{v}_{2}||} = (e_{12}, e_{22}, \dots, e_{p2})^{\top} = e_{2} \\
  \vdots\\
 \cfrac{\vec{v}_{p}}{||\vec{v}_{p}||} = (e_{1p}, e_{2p}, \dots, e_{pp})^{\top} = e_{p}
\end{matrix}

    e_{i}^{\top} e_{j} = \delta_{ij} =   \begin{cases}
      0 & \text{if } i \neq j\\
      1 & \text{if } i = j
    \end{cases}

    and i, j = 1, \dots, p;

  4. The relation P^{-1}AP = D holds, where P is defined as follows

    P_{p \times p} =\begin{bmatrix}
 e_{.1} & \dots & e_{.p} \\
\end{bmatrix} = 
\begin{bmatrix}
 e_{11} &  e_{12} & \dots & e_{1p} \\
 e_{21} &  e_{22} & \dots & e_{2p} \\
 \vdots &  \vdots & \ddots & \vdots \\
 e_{p1} &  e_{p2} & \dots & e_{pp} \\
\end{bmatrix}

    where P is the matrix whose columns are the orthonormal eigenvectors.

    Reminder: an orthogonal matrix P has the following properties: P^{\top}P = I = PP^{\top} and P^{-1} = P^{\top}.

    and the matrix D is defined by

    D_{p \times p} =
\begin{bmatrix}
\lambda_{1} &         &       & \\
        &  \lambda_{2} &       & \\
        &         &\ddots & \\
        &         &       & \lambda_{p}\\
\end{bmatrix}

    where D is a diagonal matrix whose diagonal elements are the eigenvalues;
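The theorem above can be illustrated numerically. The sketch below (my own, not from the original text) reuses the symmetric matrix from the eigenvalue example and verifies P^{\top}P = I, P^{-1}AP = D, and the reconstruction A = PDP^{\top}.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Columns of P are orthonormal eigenvectors; D holds the eigenvalues
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

# For an orthogonal P, P^{-1} = P^T, so P^T A P equals D
assert np.allclose(P.T @ A @ P, D)
assert np.allclose(P.T @ P, np.eye(2))   # P^T P = I = P P^T
assert np.allclose(P @ D @ P.T, A)       # spectral decomposition A = P D P^T
```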

Definition 2. A quadratic form in \mathbb{R}^{p} is a function Q: \mathbb{R}^{p} \rightarrow \mathbb{R} of the form Q(y) = y^{\top}Ay.

  1. A is positive definite \Leftrightarrow Q(y) > 0, \forall y \in \mathbb{R}_{\small\setminus0}^{p}.
  2. A is positive semidefinite \Leftrightarrow Q(y) \geq 0, \forall y \in \mathbb{R}_{\small\setminus0}^{p}.

Hence

  1. A is positive definite \Leftrightarrow \lambda_{j} > 0 for all j = 1, \dots, p.
  2. A is positive semidefinite \Leftrightarrow \lambda_{j} \geq 0 for all j = 1, \dots, p.

Important: If A \in \mathbb{R}^{p \times p} is positive definite:

  1. A is non-singular (\text{det}(A) > 0). Example:

    A = \begin{bmatrix}
            a & b \\
            b & d \\
          \end{bmatrix} \Rightarrow \text{det}(A) = ad - b^{2}

    \Rightarrow A^{-1} = \cfrac{1}{\text{det}(A)}\begin{bmatrix}
            d & - b \\
            - b & a \\
          \end{bmatrix}

  2. A^{-1} is positive definite.

If A \in \mathbb{R}^{p \times p} is positive semidefinite:

  1. All elements on the main diagonal are non-negative and \mathrm{tr}(A) \geq 0.
  2. All principal minors of A are non-negative.

COVARIANCE MATRIX

The covariance matrix is

\Sigma = \mathrm{E}((X - \mathrm{E}(X))(X - \mathrm{E}(X))^{\top})

where X is an n \times p matrix of random variables from a sample; x_{l} = (x_{l1}, \dots, x_{lp})^{\top} is the l-th observation of X written as a column vector, with l = 1, \dots, n; and the sample mean vector is

\overline{x} = \sum_{l = 1}^{n}\cfrac{x_{l}}{n} = (\overline{x}_{1}, \dots, \overline{x}_{p})^{\top}

Therefore, the sample covariance matrix is

S = \cfrac{1}{n - 1} \sum_{l = 1}^{n} (x_{l} - \overline{x})(x_{l} - \overline{x})^{\top}

Now let Z_{p \times 1} = (z_{1}, \dots, z_{p})^{\top} \in \mathbb{R}_{\small\setminus0}^{p} be any column vector. Thus

Z^{\top}SZ = Z^{\top}\bigg( \cfrac{1}{n - 1} \sum_{l = 1}^{n} (x_{l} - \overline{x})(x_{l} - \overline{x})^{\top} \bigg)Z

Since Z does not depend on l, we can move it inside the summation

Z^{\top}SZ = \bigg( \cfrac{1}{n - 1} \sum_{l = 1}^{n} \underset{1 \times p}{Z^{\top}}\underset{p \times 1}{(x_{l} - \overline{x})}\underset{1 \times p}{(x_{l} - \overline{x})^{\top}}\underset{p \times 1}{Z} \bigg)

Z^{\top}SZ = \cfrac{1}{n - 1} \sum_{l = 1}^{n} \Big( (x_{l} - \overline{x})^{\top}Z \Big)^{2} \geq 0

Thereby, the covariance matrix will always be positive semidefinite. Notice that Z^{\top}SZ will be zero only if (x_{l} - \overline{x})^{\top}Z = 0 for all l = 1, \dots, n.
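The derivation above can be verified empirically. The sketch below (my own illustration, with randomly generated data) builds S from the definition, checks it against NumPy's `np.cov`, and confirms that all its eigenvalues are non-negative, i.e. S is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))       # n = 50 observations, p = 3 variables

# Sample covariance: S = (1/(n-1)) * sum_l (x_l - xbar)(x_l - xbar)^T
xbar = X.mean(axis=0)
Xc = X - xbar
S = (Xc.T @ Xc) / (X.shape[0] - 1)

assert np.allclose(S, np.cov(X, rowvar=False))     # matches NumPy's estimator
# Positive semidefinite: every eigenvalue >= 0 (up to rounding error),
# equivalently Z^T S Z >= 0 for every Z
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)
```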

Another point to keep in mind: if \text{det}(S) = 0 (which happens when Z^{\top}SZ = 0 for some Z \neq 0), then S is a singular matrix, i.e., it has no inverse. In other words, S is ill-conditioned, and in my opinion, you should start by checking whether there is a collinearity problem. Alternatively, we can use the Moore-Penrose generalized inverse to obtain a pseudoinverse matrix.
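As a sketch of that last point (my own example, not from the original text): duplicating a variable produces perfect collinearity, so the sample covariance matrix is singular, yet the Moore-Penrose pseudoinverse still exists.

```python
import numpy as np

# Perfect collinearity: the second variable is an exact copy of the first,
# so det(S) = 0 and no ordinary inverse exists
rng = np.random.default_rng(1)
x = rng.normal(size=50)
X = np.column_stack([x, x])
S = np.cov(X, rowvar=False)

assert np.isclose(np.linalg.det(S), 0.0)

# The Moore-Penrose pseudoinverse satisfies S S^+ S = S even for singular S
S_pinv = np.linalg.pinv(S)
assert np.allclose(S @ S_pinv @ S, S)
```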

PRINCIPAL COMPONENT ANALYSIS

PCA is a technique that finds the directions in which the projection of the data has the largest variance. The first such direction is, by definition, called the first principal direction. To find it, consider the following optimization problem for e_{1} \in \mathbb{R}^{p}

\begin{aligned}
& \underset{e_{1} \neq 0}{\text{max}}
& & \mathrm{Var}(X e_{1}) = e_{1}^{\top}Se_{1} \\
& \text{subject to} & & ||e_{1}||^{2} = 1,  \\
\end{aligned}

Using the formalization of the Lagrange multiplier method to e_{1}, we have

L = e_{1}^{\top}Se_{1} - \lambda_{1}(e_{1}^{\top}e_{1} - 1)

\begin{aligned}
\frac{\partial L}{\partial e_{1}} &= 2Se_{1} - 2\lambda_{1}e_{1} = 0 \Leftrightarrow 
Se_{1} = \lambda_{1}e_{1} \Rightarrow e_{1}^{\top}Se_{1} = \lambda_{1} \\
 \\
\frac{\partial L}{\partial \lambda_{1}} &= e_{1}^{\top}e_{1} - 1 = 0  \Leftrightarrow e_{1}^{\top}e_{1} = 1\\
\end{aligned}

It is not hard to see that \lambda_{1} is the maximum variance for \underset{ n \times 1}{Y_{1}} = \underset{ n \times p}{(X - \overline{x})} \underset{ p \times 1}{e_{1}}.

Now, we need to find the second direction, called the second principal direction. Consider the following optimization problem for e_{2} \in \mathbb{R}^{p}

\begin{aligned}
& \underset{e_{2} \neq 0}{\text{max}}
& & \mathrm{Var}(X e_{2}) = e_{2}^{\top}Se_{2} \\
& \text{subject to} & & ||e_{2}||^{2} = 1,  \\
&                   & & e_{1}^{\top}e_{2} = 0, 
\end{aligned}

Using the formalization of the Lagrange multiplier method, we have

L = e_{2}^{\top}Se_{2} - \lambda_{2}(e_{2}^{\top}e_{2} - 1) - \phi e_{2}^{\top}e_{1}

\begin{aligned}
\frac{\partial L}{\partial e_{2}} &= 2Se_{2} - 2\lambda_{2}e_{2} - \phi e_{1} = 0 \\
 \\
\frac{\partial L}{\partial \lambda_{2}} &= e_{2}^{\top}e_{2} - 1 = 0 \Leftrightarrow  e_{2}^{\top}e_{2} = 1\\
\\
\frac{\partial L}{\partial \phi} &=  e_{2}^{\top}e_{1} = 0  \\
\end{aligned}

So, we are going to find the values of \lambda_{2} and \phi; the first one will be \phi. Premultiplying the first equation by e_{1}^{\top}:

\begin{aligned}
2Se_{2} - 2\lambda_{2}e_{2} - \phi e_{1} &= 0 \\
2e_{1}^{\top}Se_{2} - 2\lambda_{2}e_{1}^{\top}e_{2} - \phi e_{1}^{\top}e_{1} &= 0, \ (e_{1}^{\top}e_{1} = 1, e_{1}^{\top}e_{2} = 0, \ \mbox{and} \ e_{1}^{\top}S = \lambda_{1}e_{1}^{\top} \ )\\
2\lambda_{1}e_{1}^{\top}e_{2} - \phi &= 0, \ (e_{1}^{\top}e_{2} = 0)\\
\phi &= 0
\end{aligned}

and to \lambda_{2}

\begin{aligned}
\frac{\partial L}{\partial e_{2}} &= 2Se_{2} - 2\lambda_{2}e_{2} - \phi e_{1} = 0, \ (\phi = 0) \Leftrightarrow Se_{2} = \lambda_{2}e_{2} \Rightarrow  e_{2}^{\top}Se_{2} = \lambda_{2}\\
 \\
\end{aligned}

Now, \lambda_{2} is the second maximum variance for \underset{ n \times 1}{Y_{2}} = \underset{ n \times p}{(X - \overline{x})} \underset{ p \times 1}{e_{2}}.

The results above can be extended to k principal components, where k \leq p. Then

\begin{aligned}
\underset{ n \times 1}{Y_{1}} = \underset{ n \times p}{(X - \overline{x})} \underset{ p \times 1}{e_{1}} &\Leftrightarrow  \mathrm{Var}(Xe_{1}) = e_{1}^{\top}Se_{1} = \lambda_{1}\\
\underset{ n \times 1}{Y_{2}} = \underset{ n \times p}{(X - \overline{x})} \underset{ p \times 1}{e_{2}} &\Leftrightarrow  \mathrm{Var}(Xe_{2}) = e_{2}^{\top}Se_{2} = \lambda_{2}\\
 \vdots \\
 \underset{ n \times 1}{Y_{k}} = \underset{ n \times p}{(X - \overline{x})} \underset{ p \times 1}{e_{k}} &\Leftrightarrow  \mathrm{Var}(Xe_{k}) = e_{k}^{\top}Se_{k} = \lambda_{k}\\
\end{aligned}

where \lambda_{1} \geq \lambda_{2} \geq \dots \geq \lambda_{k}.
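The whole chain — covariance matrix, eigendecomposition, projection — can be sketched end to end. Below is a minimal PCA implementation via the covariance eigendecomposition (my own illustration with synthetic data); it checks that the variance of each projected component Y_{k} equals the corresponding eigenvalue \lambda_{k}.

```python
import numpy as np

# Synthetic correlated data: n = 200 observations, p = 3 variables
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

# Eigendecomposition of the sample covariance matrix S
S = np.cov(X, rowvar=False)
eigenvalues, E = np.linalg.eigh(S)          # ascending order
order = np.argsort(eigenvalues)[::-1]       # reorder: lambda_1 >= ... >= lambda_p
eigenvalues, E = eigenvalues[order], E[:, order]

# Project centered data onto the principal directions: Y = (X - xbar) E
Y = (X - X.mean(axis=0)) @ E

# Var(Y_k) = e_k^T S e_k = lambda_k for every component
assert np.allclose(np.var(Y, axis=0, ddof=1), eigenvalues)
```

The sorting step matters: `np.linalg.eigh` returns eigenvalues in ascending order, while PCA conventionally labels the largest-variance direction as the first component.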
